Performance Characterization of the 64-bit x86 Architecture from Compiler Optimizations' Perspective
Abstract
Intel Extended Memory 64 Technology (EM64T) and the AMD 64-bit architecture (AMD64) are emerging 64-bit x86 architectures that are fully x86 compatible. Compared with the 32-bit x86 architecture, the 64-bit x86 architectures offer applications several new features. For instance, applications can address 64 bits of virtual memory space, perform operations on 64-bit-wide operands, access 16 general-purpose registers (GPRs) and 16 extended multi-media (XMM) registers, and use a register-based argument passing convention. In this paper, we investigate the performance impact of these new features from the standpoint of compiler optimizations. Our research compiler is based on the Intel Fortran/C++ production compiler, and our experiments are conducted on the SPEC2000 benchmark suite. Results show that with 64-bit-wide pointer and long data types, several SPEC2000 C benchmarks slow down by more than 20%, mainly because of the enlarged memory footprint. To evaluate the performance potential of 64-bit x86 architectures, we designed and implemented the LP32 code model, in which pointer and long are both 32 bits wide. Our experiments demonstrate that, on average, the LP32 code model speeds up the SPEC2000 C benchmarks by 13.4%. For the register-based argument passing convention, our experiments show that the performance gain is less than 1% because of aggressive function inlining. Finally, we observe that using 16 GPRs and 16 XMM registers significantly outperforms using only 8 GPRs and 8 XMM registers. However, our results also show that using 12 GPRs and 12 XMM registers achieves performance competitive with that of 16 GPRs and 16 XMM registers.
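The footprint effect described in the abstract can be illustrated with a small sketch. It is not taken from the paper: the node layout, the function names, and the assumption that every scalar field is aligned to its own size are all illustrative choices, picked only to show how a pointer-heavy record grows when pointer and long widen from 32 to 64 bits.

```python
def struct_size(field_sizes):
    """Size in bytes of a C-style struct, assuming (as a simplification)
    that each scalar field is aligned to a multiple of its own size."""
    offset = 0
    max_align = 1
    for size in field_sizes:
        # Pad so the field starts at a multiple of its own size.
        offset = (offset + size - 1) // size * size
        offset += size
        max_align = max(max_align, size)
    # Pad the tail so that arrays of the struct stay aligned.
    return (offset + max_align - 1) // max_align * max_align


def node_size(pointer_bytes, long_bytes, int_bytes=4):
    # Hypothetical list node: { long key; node *next; node *prev; int flag; }
    return struct_size([long_bytes, pointer_bytes, pointer_bytes, int_bytes])


lp64 = node_size(pointer_bytes=8, long_bytes=8)  # 64-bit pointer and long
lp32 = node_size(pointer_bytes=4, long_bytes=4)  # 32-bit pointer and long
print(f"64-bit pointer/long: {lp64} bytes per node")
print(f"32-bit pointer/long: {lp32} bytes per node")
```

Under these assumptions the same node takes twice as many bytes with 64-bit pointer and long, so fewer nodes fit in each cache line. Real benchmarks mix pointer and non-pointer data, so the actual growth depends on the program.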
Similar resources
A Space-Aware AMD64 Port of Jikes RVM
As computers attempt to work with larger and more complex datasets, the need for 64-bit computing is becoming more acute. Many applications, such as video editing and large scale databases, are reaching the limits of addressable memory in 32-bit computers. These new 64-bit architectures are dependent on compilers to produce equivalent or better code. Here we look at the issues in porting the Ji...
Extending a Compiler Backend for Complete Memory Error Detection
Technological advances drive hardware to ever smaller feature sizes, causing devices to become more vulnerable to faults. Applications can be protected against errors resulting from faults by adding error detection and recovery measures in software. This is popularly achieved by applying automatic program transformations. However, transformations applied to intermediate program representations ...
IA-64 Code Generation
Vikram Rao. IA-64 code generation. (Under the direction of Dr. Tom Conte). This work presents an approach to code generation for a new 64-bit Explicitly Parallel Instruction Computing (EPIC) architecture from Intel, called IA-64. The major contribution of this work is the design of a machine independent optimizer, munger, that transforms code generated originally for a Very Long Instruction Wor...
MAO - An extensible micro-architectural optimizer
Performance matters, and so does repeatability and predictability. Today’s processors’ micro-architectures have become so complex as to now contain many undocumented, not understood, and even puzzling performance cliffs. Small changes in the instruction stream, such as the insertion of a single NOP instruction, can lead to significant performance deltas, with the effect of exposing compiler and...